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Emulating the input-output functions performed by a brain structure opens the possibility 
for developing neuroprosthetic systems that replace damaged neuronal circuits. Here, we 
demonstrate the feasibility of this approach by replacing the cerebellar circuit responsi- 
ble for the acquisition and extinction of motor memories. Specifically, we show that a rat 
can undergo acquisition, retention, and extinction of the eye-blink reflex even though the 
biological circuit responsible for this task has been chemically inactivated via anesthesia. 
This is achieved by first developing a computational model of the cerebellar microcircuit 
involved in the acquisition of conditioned reflexes and training it with synthetic data gener- 
ated based on physiological recordings. Secondly, the cerebellar model is interfaced with 
the brain of an anesthetized rat, connecting the model's inputs and outputs to afferent and 
efferent cerebellar structures. As a result, we show that the anesthetized rat, equipped 
with our neuroprosthetic system, can be classically conditioned to the acquisition of an 
eye-blink response. However, non-stationarities in the recorded biological signals limit the 
performance of the cerebellar model. Thus, we introduce an updated cerebellar model and 
validate it with physiological recordings showing that learning becomes stable and reli- 
able. The resulting system represents an important step toward replacing lost functions 
of the central nervous system via neuroprosthetics, obtained by integrating a synthetic 
circuit with the afferent and efferent pathways of a damaged brain region. These results 
also embody an early example of science-based medicine, where on the one hand the 
neuroprosthetic system directly validates a theory of cerebellar learning that informed the 
design of the system, and on the other one it takes a step toward the development of 
neuro-prostheses that could recover lost learning functions in animals and, in the longer 
term, humans. 

Keywords: neuroprosthetics, cerebellum, classical conditioning, timing, memory, association learning, nucleo- 
olivary pathway, inferior olive 



INTRODUCTION 

Neural prostheses between the central nervous system and periph- 
eral systems have a relatively recent development history. Some 
examples are retinal and cochlear implants (Eddington et al, 1978; 
Wilson et al, 1991; Zrenner, 2002; Cohen, 2007), and brain com- 
puter interface systems controlling artificial limbs (Chapin et al, 
1999; Schwartz et al., 2006; Moritz et al, 2008; Hochberg et al, 
2012). However, the bi-directional coupling of a prosthetic sys- 
tem with the central nervous system has only very recently been 
addressed ( Giovannucci et al, 2010; Berger et al., 2011; Bamford 
et al, 2012). Here we demonstrate the functional bi-directional 
coupling of an artificial system and the central nervous system 
in the context of classical conditioning. Classical conditioning is 
one of the most essential forms of associative learning (Pavlov and 
Anrep, 1927). In classical conditioning, an initially neutral Con- 
ditioned Stimulus (CS - see Table Al in the Appendix for the 



list of abbreviations) precedes an aversive or appetitive Uncondi- 
tioned Stimulus (US), leading to the acquisition of a Conditioned 
Response (CR). A widely employed paradigm in classical con- 
ditioning is eye -blink reflex conditioning, where an animal is 
exposed to a CS, e.g., a tone, followed after a certain inter- stimulus 
interval (ISI) by an aversive US to the eye or periorbital area, e.g., an 
air-puff (Kehoe and Macrae, 2002). After repeated paired stimulus 
presentations, the animal closes the eyeHds in anticipation of the 
expected air-puff; this anticipatory action is known as the condi- 
tioned response. If a conditioned animal is subsequently exposed 
to tones not followed by the air-puff US (CS-alone stimulation or 
extinction training), the previously acquired associative CR disap- 
pears and the CS reacquires its initial neutral nature. Remarkably, 
if we repeat the initial training after extinction, the CRs are more 
rapidly acquired, a phenomenon known as savings (Napier et al., 
1992). 
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FIGURE 1 I Biological microcircuit and synthetic counterpart. Recording 
(PN and 10) and stimulation sites are shown. After amplification and filtering 
of the signals recorded in the afferent structures, discrete events retrieved 
from multi unit activity are isolated by the event detection stages of the 
system, such that they are fed to their counterparts in the synthetic 
cerebellum (PN and 10). In the intact circuit, the repeated coincidence of CS 
and US signals within the cerebellar cortex induces plasticity causing the 
cerebellum to respond to the CS with a CR. In our model, once such a CR is 
acquired, it is relayed via the synthetic DN to the facial nucleus (FN) of the 
rat as an electrical stimulation that causes the animal to trigger the 
behavioral CR, i.e., the eye-blink. In addition, within the model, the CR 
triggered by the DN inhibits the 10, preventing a US-derived signal from 
reaching the cerebellum once a protective action has already been issued. 
Since anesthesia prevents acquisition in the rodent's cerebellum, behavioral 
CRs expressed in the set up studied here are controlled by the synthetic 
circuit. 



The cerebellum is critical for the acquisition of CRs in eye -blink 
conditioning (Hesslow and Yeo, 2002; Christian and Thompson, 
2003). The CS signal reaches the cerebellum through the mossy 
fibers originating in the Pontine Nuclei (PN), while the US sig- 
nal is projected through the climbing fibers originating in the 
Inferior Olive (lO). These two projections converge onto the cere- 
bellar Purkinje cells that control through dis-inhibition of the deep 
nuclear cells. Deep nuclear neurons synapse with the motor neu- 
rons responsible for the production of CRs. Purkinje cells, the sole 
output of the cerebellar cortex, thus indirectly control the motor 
neurons with an inverse relationship, they drive CRs by learning to 
timely reduce their activity in presence of the CS (Jirenhed et al., 
2007). These areas of the cerebellar cortex, cerebellar nuclei, and 
lO regulating the acquisition of conditioned eye-blinks constitute 
one of the many cerebellar microcircuits, which are considered 
the elementary and parallel computational units that form the 
cerebellum (see Figure 1). 

Based on these assumptions, we have previously devel- 
oped a computational model of learning in the cerebellum 
(Verschure and Mintz, 2001; Hofstotter et al, 2002) that was 
implemented in a VLSI hardware and tested in a robot learn- 
ing task (Hofstotter et al, 2004). Here, we show how an updated 
version of this computational model can be deployed as a pro- 
totype of a neuroprosthetic device and interfaced with the brain 
of a living animal, replacing the function of the animal's inacti- 
vated cerebellum. This experiment is one step in a larger set of 
experiments to validate the Distributed Adaptive Control theory 
of mind and brain (DAC) that identifies the cerebellum as a key site 
for the shaping of self- generated states including discrete actions 
observed in classical conditioning (Verschure, 2012). 

In this case, the computational model is fed not with the artifi- 
cial signals generated in the robot experiments, but with biosignals 
acquired through electrodes. Biosignals, unlike the robot input sig- 
nals, are inherently stochastic both because of recording noise and 
inherent spiking fluctuations. For instance, the climbing fibers 
(lO) spiking activity is known to have a 1 Hz baseline activity. 
Indeed, the features of the experimental recordings presented here 
match the known lO physiology, showing a baseline firing rate 
of 0.5-2 Hz (De Zeeuw et al, 1998). Therefore, the encoding of 
the US in the lO will be very noisy, and will prevent learning 
if we naively use the model in Hofstotter et al. (2004). Hence, 
our first goal is to update such model in order to cope with 
biosignals. 

Our second goal is to vaHdate the algorithms designed for 
interfacing and operating a neuroprosthetic system in an in vivo 
bio-hybrid preparation. The system under evaluation (Figure 1) 
implements a real-time model of cerebellar learning that is driven 
by signals recorded directly from the PN and lO, detecting CS and 
US events from these recording channels, respectively. The out- 
put of the neuroprosthetic system is linked to a microstimulator 
targeting the facial nucleus (FN), where proper stimulation would 
evoke well-timed CRs, with latencies matched to the biological 
circuit being replaced (see Prueckl et al., 2011a for specifics on 
the physiology of this experiment). Since in our preparation, the 
acquisition of natural CRs is precluded by anesthesia, any overt 
CRs observed in the experiment are the result from associative 
learning occurring solely within the synthetic system. 



Finally, we address a learning stability issue emerged during the 
in vivo testing phase: we detected a non-stationarity in the level of 
spontaneous activity in the lO channel, and quantified its impact 
on the model performance. Even though learning can still take 
place, learning stability is hindered and the possibility of chronic 
implantation is precluded. We thus implemented a variation of the 
cerebellar algorithm that is robust under slow non-stationarities, 
in our case slow fluctuations in the lO activity. 

We beheve that our approach defines a specific paradigm for 
the generation of neuroprosthetic systems that evolves following 
four steps: (1) identify the input and output structures and their 
encoding, (2) identify the anatomical and physiological princi- 
ples underlying the computations performed by the target system, 
(3) integrate steps 1 and 2 with the appropriate signal processing 
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in a single device, and (4) miniaturize the neuroprosthetic sys- 
tem while optimizing its power consumption. Here, we emphasize 
steps 1-3 since we already previously have demonstrated step 4 
(Hofstotter et al, 2004; Bamford et al, 2012). 

In summary, with this work we sought to provide further 
evidence for the fundamental principle underlying our model; 
namely, that the activity of the lO constitutes a teaching signal that 
controls the acquisition or extinction of CRs, and that by regulat- 
ing the lO activity, the nucleo- olivary inhibition (NOI) stabilizes 
the CRs during paired CS-US training and drives extinction dur- 
ing CS-alone stimulation. Our results provide such evidence, and 
additionally, they demonstrate at the design level the possibility of 
realizing long-term, noise-resistant implantable low power neuro- 
prosthesis supporting the acquisition, retention, and extinction of 
novel behavior even if the biological substrate has lost its learning 
capability due to trauma or aging. 

MATERIALS AND METHODS 
CEREBELLAR MODEL 
Latencies 

It is well-known in the domain of control theory that the latencies 
and delays inherent in a system to be controlled play an important 
role in the design of the controller. Here, our controller is based on 
the cerebellar microcircuit involved in eye-blink conditioning. In 
nature, such a microcircuit must have internalized the latencies to 
the eye-blink system in several ways, one of them arguably being 
through the unusually long latency of the NOI (Hesslow, 1986) 
that we had previously interpreted as allowing for the matching 
of the system delays (Hofstotter et al, 2002). Informally, once an 
error signal reaches the lO, such a delay indicates how far ahead of 
it the cerebellum should have taken a protective action for it to be 
effective. Consistently with this view, in the computational model 
that we employ the latency between the activation of the deep 



nucleus (DN) and the onset of the inhibition of the lO (the NOI 
delay, Anoi) sets the anticipation of the CR execution relative to the 
expected US arrival (Hofstotter et al, 2002) (see Figure 2). There- 
fore, we will first discuss the latencies associated with the task of 
classical conditioning, since their specific properties underlie the 
cerebellar computational model. 

Concretely, setting a functional delay for the NOI requires 
knowledge of the transmission or mechanical latencies involved 
in the task, otherwise the internal timing of the CR and US signals 
would result in non-adaptive CRs coming too late or too early 
with respect to the US air-puff. In other words, for the blink to 
anticipate the air-puff, Anoi has to exceed the sum of the sensory 
latency between the air-puff reaching the cornea and the US detec- 
tion (o^us) plus the effector latency between the CR triggering by 
the DN and the effective eyelid closure (cocr): 



A, 



: <^CR + <^us- 



(1) 



In the literature, this sum of afferent plus efferent latencies is 
referred to as the delay of the error feedback (Miall et al, 1993). 
By setting Anoi to this minimal latency, the CR and the US onsets 
will match. However, to achieve a better protection form the US, 
the best temporal arrangement of CR and US is that of the US 
onset coinciding with the middle of the CR. Given that we elicit 
the CR by an electrical stimulation lasting 150 ms (Prueckl et al., 
201 lb), such a temporal arrangement is achieved adding 75 ms to 
the minimal latency in Eq. (1). 

On the other hand, the sum of the latency between the onset of 
the CS delivery and its detection (cocs) plus the latency between 
the FN stimulation and the CR execution (cocr) affects the optimal 
internal timing (^cr) that the model has to acquire for a given ISI: 



tcR = ISI - (o^cs + <^cr). 



(2) 
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FIGURE 2 I Intrinsic latencies of the eye-blink conditioning 
preparation. (A) ISI, inter-stimulus interval; cocs, latency between the 
peripheral CS stinnulation and the detection of its associated neuronal 
response in the PN; fcR, internal response timing learned by the model 
between the CS detection and the CR triggering; ^cr, latency between the 



neuronal triggering of the CR and the effective eyelid closure, Anoi, delay 
between the CR trigger and the onset of the negative feedback loop 
inhibition; ojus, latency between the US-trigger and the detection of its 
associated neural response in the 10. (B) Same latencies as in (A) for the 
minimum learnable ISI. 
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This time interval is shorter than the external IS! since it 
accounts for the detection and execution latencies. Note that 
diminishing tcR toward 0, we get the minimum ISI that is 
learnable by the model, where CS detection immediately triggers 
aCR: 

minisi = cocs + <^CR- 

For an ISI shorter than minjsi, a CR initiated by the cerebellum 
will always come after the US. For this reason, we will design a 
controller that only acquires CRs whenever the ISI exceeds this 
value. 

Computational model 

In what follows, we summarize the biological model based on 
Vprqrhnrp nnd Mintz (2001) and Hofstotter et al. (2002) and 
upgraded to cope with biosignals. Our model is based on the 
following assumptions: 

1. the cerebellum is the brain area principally involved in the 
acquisition of a CR in the delay classical conditioning paradigm; 

2. the only inputs received by the cerebellum are the mossy 
fibers, carrying CS-related information, and the climbing fibers, 
carrying US -related information; 

3. the mechanism responsible for the acquisition of a condi- 
tioned response is plasticity at the parallel fiber to Purkinje 
cell synapses; 

4. such plasticity is induced by the stimulation of parallel fiber, 
alone (long-term potentiation - LTP) or jointly with climbing 
fiber (long-term depression - LTD); 

5. lO, cerebellar cortex, and DN are organized in distinct micro- 
complexes, which constitute negative feedback loops over 
lO; 

6. the timing of the CR is adapted to the length of the ISI by these 
olivo-cortico-nuclear feedback loops that control the plasticity 
at parallel fiber-Purkinje cell synapses by gating the climbing 
fiber error signal; 

7. the training procedure leads to a pause in Purkinje cell activity 
following CS presentation; 

8. a CR is triggered by dis- inhibition of the deep nucleus by the 
cessation of Purkinje cell firing; 

9. Purkinje cells operate in two distinct modes: a spontaneous and 
a CS-driven mode. Informally, the Purkinje cell is always main- 
tained active during spontaneous activity of the input parallel 
fiber. However, during a CS presentation, Purkinje cells switch 
to a decaying activity. For a detailed explanation see Hofstotter 
etal. (2004). 

Here, in order to deploy the cerebellar model on a low power 
a VLSI platform, we generated a computational model function- 
ally equivalent to previous versions (Verschu 2001; 
Hofstotter et al, 2002, 2004) albeit more abstract from an 
anatomical standpoint to ensure computational efficiency. 

Process descriptions. The trace generation, scaling, and thresh- 
olding processes (1, 2, and 3 in Figure 3) model the processing of 
information that enters the cerebellum via the mossy fibers and 



leaves it through the excitatory axons of deep nuclear cells that 
projects to red nucleus which, in turn, excites FN (Hesslow -^"^ 
Yeo, 2002; Christian and Thompson, 2003). The trace generation 
( 1 ) process codes the time since the CS onset with a decaying trace 
having a fixed initial value (to), final value (ti), and duration 
(A^). This trace defines the memory span of the system; i.e., the 
maximum temporal gap between CS detection and a CR execu- 
tion learnable by the system. The Scaling (2) process multiplies the 
trace with the memory parameter w, which is the only parameter 
modified by learning. With w, we mimic the changes in synaptic 
efficacy that occur in the molecular layer, due to LTD in the paral- 
lel fiber to Purkinje cell synapse and/or other kinds of associative 
plasticity (Dean et al, 2010). Lastly, the thresholding (3) process 
triggers a CR whenever the value of the scaled trace falls below 
a decision threshold (^cr)- Within this process, we collapse all 
the transductions that occur postsynaptically from the Purkinje 
cells down to the efferents of the deep nuclei. In short, these three 
processes map event detections in the PN into stimulation of the 
rat FN. The parameter w regulates the mapping and, by scaling the 
trace signal, controls whether a response is triggered or not, and 
if so, determines its timing in a way analogous to the biological 
system. 

Three processes control the negative feedback loop that stabi- 
lizes learning: inhibitory pulse (4), delay (5), and gating (6). The 
role of the negative feedback in classical conditioning is to prevent 
the error signal triggered by the US from reaching the cerebellar 
cortex if a CR has already been triggered (Medina et al., 2001; 
Verschure and Mintz, 2001). Processes 4 and 5 set the shape of the 
inhibitory square pulse. Its duration matches the duration of the 
CS trace, such that the lO can be inhibited for the whole duration 
of the CS trace. Process 4 delays the pulse by Anoi seconds. In prac- 
tice, the value used was on the order of 100 ms. The gating process 
(6) suppresses lO detections that co-occur with the inhibitory 
pulse. In summary, these components functionally reproduce the 
inhibitory control of the deep nuclear cells over the lO (Bengtsson 
and Hesslow, 2006). 

The last two components, delay (7) and coincidence detection 
(8), update the associative weight w, thereby controlling the learn- 
ing of the CR timing. Process delay delays the CS stimulus trace by 
Anoi seconds (same value introduced above). The resulting trace 
defines the temporal window wherein errors detected by lO can be 
prevented by the cerebellum. For instance, if a US is detected fol- 
lowing a CS but ahead of this temporal window, then the CS-US 
interval is too short and any CR initiated by the cerebellum after 
the CS detection could not avoid the US [see Equation (1)]. Like- 
wise, such a trace defines a so-called eligibility window wherein lO 
activity can be associated with a given PN detection. In short, the 
system has a minimum ISI of Anoi seconds, and a maximum ISI of 
Anoi + seconds. Lastly, the coincidence detection process (8) 
checks whether event detections in the CS and US pathways coin- 
cide and updates w accordingly. Namely, it decreases w every time 
an lO detection overlaps with the eligibility trace, and increases w 
if no lO detection occurs during that period. The function per- 
formed by these last two processes mimics the control of plasticity 
in the parallel fiber-Purkinje cell synapse (Wang et al., 2000; Safo 
and Regehr, 2008; Sarkisov and Wang, 2008). The initial value of 
w is set to wq. 
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FIGURE 3 I Functional model of the cerebellum. The processes in the top 
row (white boxes) map PN activity into action; in the case of eye-blink 
conditioning, tone detections into eye-blinks. Such mapping is controlled by 
the memory parameter w. The shaded processes adapt the mapping, namely. 



they are involved in the adjustment of w. The numbers identify specific 
processes. The latencies affecting each of the recording and stimulating 
channels as well as the parameters used in each process (see main text for an 
explanation). 



Calibration oftlie cerebellar model 

Definition of the optimization problem. We mentioned that the 
coincidence detection process in Figure 3 modulates the w para- 
meter thereby controlHng the acquisition and timing of CRs. In 
our implementation, the synaptic efficacy w is modified in lin- 
ear steps, namely, 5^ for depression and 3p for potentiation. The 
cerebellar model optimization consists of selecting the plasticity 
parameters 5 j and 8p that result in a desired learning behavior. We 
solve this optimization problem in two different scenarios: with 
synthetic data or with data directly obtained from the brain activ- 
ity of the animal. With the former, we assess the properties of the 
model, whereas with the latter it is applied in the bio-hybrid prepa- 
ration. In both cases, the data consist of a set of detections in both 
recording sites of the system (PN and lO) and each set might con- 
tain evoked- detections (caused by the CS or the US, respectively) 
of spontaneous events. We refer to the former as true detections 
(TDs) and to the second as false alarms (FAs). 

Informally, we impose that the learning dynamics of the system 
mirror the behavior: when CSs and USs are paired, the circuit 
should learn to produce CRs within tens of trials; when in a 
trained animal CSs are not paired with USs, the circuit should 
unlearn to produce CRs within tens of trials; all other conditions 
should not alter the circuit transfer function. More formally, the 
optimization problem is described by a linear system represent- 
ing three types of constraints: acquisition, extinction, and stability 
(see Table 1). 

Estimation of plasticity events, PN (CS) and lO (US) detected 
or artificially generated events are coded in binary vectors P and I, 
where each element is a time step and a value of 1 signals an event. 
The vector of eligibility traces (11, Box 1 in Figure 3) is obtained 



Table 1 | Stimulation conditions for the closed-loop experiment. 



ID Experimental condition 


Description 


1 Acquisition 


Paired CS-US presentation leading 




to acquisition of CRs 


2 Extinction 


CS-alone trials with CR leading to 




extinction of CRs 


3 Stability 


CS-alone or unpaired CS-US trials 




with no CR, causing no nnodification 




of the nnennory paranneter 


See text for further explanation. 

by convolving P with the eligibility trace waveform (e): 



n = p X €, 



where 6 is a rectangular pulse lasting seconds and delayed by 
Anoi seconds. Here, we fixed these values to 0.3 and 0.1 s, respec- 
tively. The first value is in good accordance with the maximum 
interval between CS and US bridged by the cerebellum in eye- 
blink conditioning (Moyer et al, 1990; Kalmbach et al, 2009), 
whereas the second matches the most effective interval between 
parallel fiber and climbing fiber stimulation for the induction of 
cerebellar LTD (Wang et al, 2000; Safo and Regehr, 2008; Sarkisov 
and Wang, 2008). 

Plasticity events occur under the conditions specified in Table 2. 
Firstly, a necessary condition for a potentiation or depression event 
to occur at a given time step {t) is that the eligibility trace is non- 
zero. Secondly, the number of potentiation events is P = ^ 11 for 
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Table 2 | Plasticity conditions. 



Eligibility trace vector 



US vector 



n(f) = i 
n(f)-i 
m) = o 

See text for further explanation. 



\{t) = ^ 

any \{t) 
any \{t) 



Plasticity 



Depression 
Potentiation 
No plasticity event 



potentiation occurs for every time step with a non-zero CS eUgi- 
bihty trace. Thirdly, depression occurs when US detection overlaps 
with the eligibility trace. Hence, the number of depression events 
can be obtained with the scalar product of 11 and I: 



the biological model, and for consistency with classical cerebellar 
learning theory, that links learning in the cerebellum with LTD in 
parallel fiber- Purkinje cell synapses (Albus, 1971; Ito et al., 1982), 
we suppose that CR acquisition requires depression (decrease) of 
the value of w and extinction, a potentiation (increase). Regarding 
the optimization algorithm, we weighted more the stability con- 
straint since it by itself guarantees the convergence of the learning 
dynamics, i.e., paired CS-US stimulation yields acquisition and 
CS-alone stimulation yields extinction. Informally, if under spon- 
taneous lO activity w has an average of 0 drift, then an increase 
in lO activity will lead to acquisition and a decrease, to extinction. 
Once this constraint is satisfied, the acquisition and extinction 
constraints modulate the rate of either learning process. 



(3) 



Note that whenever a depression event occurs, it outweighs 
the default potentiation events triggered by the plasticity trace 11, 
resulting in a net depression. 

In the presence of CRs, D must be corrected to account for the 
lO events (spontaneous or US-evoked) suppressed by the NOI. 
Note that since the timing of inhibition depends on the trigger- 
ing of the CR and the eligibility window is anchored to the CS, 
rapidly elicited CRs are more effective in gating plasticity than late 
CRs. In other words, the effectiveness of the gating decreases as 
the CRs become more delayed. We can heuristically approximate 
the reduction in lO events reaching the coincidence detection by 
multiplying the number of lO detections by an estimated mean 
proportion of lO events not suppressed by the NOI (a). 



D= n^((jl) = (tH^I, 



(4) 



where a = 1 — a, with a being the proportion of lO events 
suppressed by inhibition. 

As the equation illustrates, this result can be computed simply 
by multiplying the result of Equation. (3) by the factor a. 

Optimization of the plasticity parameters. At this point, having 
estimated the number of plasticity events produced by two sets of 
event detections in PN and lO, we obtain the optimal plasticity 
parameters {8p and 8^) by solving with weighted least squares of 
the following system: 



■Pi Di" 



0 



(5) 



Pi and Di are the mean plasticity events per trial, potentiation and 
depression, respectively, and the sub-indexes indicate the experi- 
mental condition (see Table 1). They are obtained by dividing D 
and P by the number of trials contained in the training set. is 
an estimate of the change in w necessary for acquisition and 
sets the desired number of trials for acquisition. These two values 
set a target mean change of w per trial. For instance, if the initial 
value of w is 0.5 and we estimate that well-timed CRs occur when w 
reaches a value of 0.3, then we set A ^ to 0.2. A ^ and are the same 
values applied to extinction. As we declared in the assumptions of 



Adaptive calibration of the model 

In the previous section, we have introduced a calibration method 
that assumes stationary biosignals during the experiment. Cru- 
cially, this is a strong assumption that will hardly ever be met under 
in vivo conditions. In our case, for instance, the rate of lO activity 
in the bio-hybrid experiment markedly fluctuated producing non- 
associative modifications in the synaptic efficacy w. For this reason, 
here we introduce an adaptive version of the calibration method 
that supports non- stationary responses in lO activity. Since the 
recalibration has to occur without resorting to additional train- 
ing data, we keep the same acquisition and extinction constraints 
used for the initial calibration, and we only update the stability 
constraint, introducing in this constraint the current estimation 
of the rate of spontaneous lO activity. 

The recalibration is periodically performed, with a fixed time 
interval. In the experiment, we used 150 s that corresponds roughly 
to 10 trials. Such recalibration requires an estimate of the ongo- 
ing level of spontaneous activity in the lO (lOfar), where the 
sub -index far stands for the false alarm rate. To compute this 
estimate we count the number of lO detections between recal- 
ibrations. Note that, since the system is blind to whether the 
detections are spontaneous or evoked, i.e., it has no knowledge 
whether stimuli are presented or not, for the estimation of lOfar, all 
detections are considered spontaneous. During acquisition, given 
that some of the lO detections will be US-evoked, this results in 
an over- estimation of the true lOfari the estimate is more accu- 
rate for a higher proportion of spontaneous detections to evoked 
ones, a result that can be easily achieved using large inter-trial 
intervals (ITI). 

Since the lO rate only affects the number of depression events, 
only 1)3 in Equation (5) (accounting for the number of depression 
events during spontaneous activity) has to be updated, whereas 
the other term has no dependence on lO activity. By updating reg- 
ularly such parameter, we provide an algorithm simple enough to 
be implemented in a low power VLSI solution. Indeed, algebraic 
manipulation shows that we can compute the solution to the opti- 
mization for each of the two plasticity parameters as a ratio of two 
polynomials with a maximum degree of 1 (for the derivation see 
Appendix). For instance, in the case of 5^ we have: 

^ _ Ot2lO]^^ + ailOfar + OtQ 
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where the coefficients of the polynomials are determined only by 
the training data. For the detailed derivation of this formula, see 
Appendix 2. 

WORK-FLOW OF THE BIO-HYBRID EXPERIMENT 

The methods introduced so far were common to the simulation 
and in vivo experiments. In what follows we introduce the method- 
ology specifically developed for the bio-hybrid preparation. 

Animal preparation and recordings 

The experimental procedure has been previously described in 
Prueckl et al. (2011a). In summary, the bio-hybrid experiment 
was performed on one naive male Sprague Dawley rat. The rat was 
housed in a cage with ad lib food and water under a reversed 12 h 
dark/light cycle. On the experiment day, the rat was anesthetized 
with i.p. injection of ketamine hydrochloride (lOOmg/kg) and 
xylazine (5 mg/kg) mixture. Body temperature was maintained by 
a heating pad set to 37°C and supplementary doses of Xylazine 
and Ketamine were administered as required. The rat's head was 
positioned in a stereotaxic head holder (Kopf instruments). The 
skull was exposed, connective tissue was removed, and skull was 
dried using a H2O2 solution (30%). Openings were drilled in the 
skull over the left lO and PN to allow the insertion of recording 
electrodes. The electrodes - a 5 Mohm tungsten electrode (A-M 
Systems, WA, USA) for the lO, and a 10-channel titanium-nitride 
micro electrode array (Faculty of Engineering, Tel Aviv Univer- 
sity) for the PN - were lowered vertically into the brainstem until 
a reliable response to tone (PN) and air-puff (lO) was observed. 
Signals from both recording sites were band-pass filtered (300- 
3000 kHz) to work in the multi unit activity range. A stimulating 
electrode was also placed in the facial nucleus and tested to induce 
reliable eye-blinks when 200 mA 0.1 ms constant- current pulses 
with a frequency of 80 Hz for 150 ms were delivered. 

Once the experiment was completed, a direct current was 
passed through the electrodes (0.5 mA for the lO and 1 mA for 
the PN), the rat was sacrificed with an overdose of pentobarbital 
and perfused transcardially (0.9% saline followed by 10% for- 
malin solution), the brain was removed, and sliced into 50 /xm 
coronal sections, stained with thionine blue, and electrode loca- 
tions were confirmed under a light microscope. All procedures 
were approved by the Tel Aviv University Animal Care and Use 
Committee (P-05-004). 

Experimental protocol 

For the classical conditioning preparation, we used as the CS a 
white-noise stimulus at 67-70 dB with a duration of 450 ms and 
a 150 ms on air-puff as the US. The presence of CRs was veri- 
fied by recording the electromyography from the orbicularis oculi. 
The inter- stimulus interval was set to 300 ms and the ITI was 
randomized between 10 and 15 s. 

After validating the responsiveness of the multi unit activity 
signal to the air-puff and the tone by visual inspection of both 
peri- stimulus time histograms of multi unit activity events, we 
recorded a training data- set that was comprised of 30 trials with 
paired CS-US presentation, followed by 2min of spontaneous 
activity. This data- set was then used to calibrate the synthetic 
cerebellum (see below). After calibration we proceeded with the 



classical conditioning paradigm, presenting the animal with paired 
CS-US stimulations until stable CRs are observed. Subsequently, 
CS- alone presentations were delivered until extinction of the CRs 
has been achieved. 

Signal processing 

The goal of the signal processing stage is to detect in the multi unit 
activity signal the onset of the responses to the CS and US, i.e., to 
the tone and the air-puff stimuli, respectively. Given the intended 
implementation in a low power VLSI solution, we limit ourselves 
to low complexity algorithms. Briefly, we detect sustained increases 
in the variability of the multi unit activity signal occurring after 
each stimulus presentation. This is achieved with the following 
steps: first we subtract from the signal a running estimation of 
the mean and rectify the resulting signal. Secondly, the signal is 
smoothed to obtain a short-term temporal average that serves as 
a measure monotonically related to the variability increase. Lastly, 
event detection occurs every time the resulting variability signal 
surpasses a certain detection threshold. 

We a priori defined the windows of possible true detections 
(TDs) for each channel (10-150 ms after the trigger in the PN and 
5-205 ms after the trigger in the lO). Likewise, the performance of 
the signal detection can be summarized with the true detections 
ratio (TDR) and the false alarm rate (FAR), where the TDR indi- 
cates the proportion of stimuli raising at least one detection within 
the TD window, and the FAR the frequency of events detections 
during the periods of no stimulation, i.e., outside the TD window. 
Note that since in the lO we found an FAR between 0.5 and 2 Hz, 
we can compare the lOfar with the spontaneous levels of activity 
in the lO (Jirenhed et al, 2007). 

Estimation of the event detections 

With the calibration data-set, we estimate for both channels the 
detection performance during early acquisition trials (before any 
CR is triggered). To estimate the number of detections during CS- 
alone trials, we combine the PN data from the paired stimulation 
trials with lO data from the spontaneous activity period. This is 
done in order not to excessively extend the calibration phase. 

Optimization of the signal detection regimes 

To signal an event detection, we have to first set a detection thresh- 
old. Such selection poses a multi- objective optimization problem 
since we want to simultaneously maximize the TDs and minimize 
the false alarms, and we do not know a priori, which is the best 
trade-off of both measures that maximizes the chance of success 
in our experiment. To overcome this problem, we iterate the opti- 
mization process over a set of threshold configurations for both 
input channels, and then select a posteriori the one yielding the 
minimum error in the optimization of the plasticity parameters 
[Equation (5)]. Likewise, the optimization process, and not us, 
selects the optimum trade-off between TDs and false alarms. Note 
that the simplicity of the calibration method previously intro- 
duced allows us to iterate over a great number of possible threshold 
configurations in very little time. 

USE OF THE MODEL WITH SIMULATED DATA 

Before deploying the computational model in the bio-hybrid 
setup, we tested the performance of the model with artificially 
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generated data. To this end, we generate a set of detections for 
each channel according to a certain pair of TDR and FAR sta- 
tistics. From these two statistics, we produce the binary vectors 
of detections P and I, and for this we convert the TDR and FAR 
to event probabilities per time step. We obtain the probability of 
detection in the absence of stimulation by multiplying the FAR 
by the model time step, which is 2 ms. Regarding TDs, to convert 
the TDR to a probability we have to consider the size of the TD 
window. Operationally, since we interpret the TDR as the prob- 
ability of getting at least one event within the detection window, 
we have to find the event detection probability yielding no events 
during the TD window with a probability of 1 — TDR. This event 
detection probability is equal to 1 — — TDR, where n is the 
size of the TD window in time steps. 

RESULTS 

SIMULATION EXPERIMENTS 

Performance of the model with spontaneous activity in the 10 

As a first step, we test whether our model supports the acquisition 
and extinction of CRs when the lO displays spontaneous activ- 
ity (see parameters in Table 3). The outcome of a representative 
simulation shows that indeed the model adapts well to the case of 
baseline lO activity (Figure 4A). It acquires well-timed CRs in CS- 
US paired trials and extinguishes them in CS-alone unpaired trials 
(Figures 4A and 5A), and, importantly, the parameter w reaches a 
stable plateau after complete extinction (Figure 5B). We stress that 
the stabilization occurring at the end of extinction even in the pres- 
ence of CS-alone stimulations, stems from the stability condition 
in Equation (5). If we remove this constraint, the overt behavioral 
results are similar (Figures 4B and 5A) but the underlying mem- 
ory dynamics differ (Figure 5B). Indeed, behaviorally both models 
only differ in the extinction phase, which is slower for the model 
with stability. However, in regard to the model's state, without sta- 
bility, the synaptic efficacy w grows also after extinction of the CRs 
has been accomplished. Note that, in consequence this can make 
reacquisition harder than acquisition if the extinction training is 



Table 3 | Parameters of the simulation. 



Parameter 


Value 




CEREBELLAR MODEU^^^^^^^^^^^^^^^^^^^H 


CS trace 




^0 


1 


T^ 


0.5 


Ar 


350 ms 


A-noi 


100 ms 


^CR 


0.2 


Wo 


0.5 


SIGNAL DETECTION ^^^^^^^^^^^^^^^^^^^^M 




0.95 


PNfar 


OHz 


\Otd 


0.75 


lOfar 


1.0Hz 


FITTING CONSTRAINTS ^^^^^^^^^^^^^^^^H 


Aa, Ae 


0.2 


Ta,Te 


40 



maintained, which goes against the experimental evidence (Kehoe 
and Macrae, 2002). In conclusion, the computational model of 
the cerebellum is also functional when the lO has baseline activity, 
requiring only a proper calibration of the plasticity parameters. 

Effect of the latencies of the cerebellar model 

We previously discussed the relevance of the latencies in the design 
of the controller. Here, we will illustrate with two examples the 
functional implications of the two latencies implemented in the 
model, namely, the latency of the NOI and the latency of the plas- 
ticity trace. We recall that in our model, both latencies are set to 
the same value, namely, Anoi seconds. 

The effects of the NOI latency on the timing of the CRs have 
already been discussed in the literature at the theoretical level 
(Hesslow and Ivarsson, 1994; Hofstotter et al., 2002), and demon- 
strated in experimental set-ups (Herreros Alonso and Verschure, 
2013). Here, and because of the noisy input conditions, we see that 
if we do not apply any delay to the NOI, the triggered CR eventu- 
ally anticipates the US, but by too short a latency too be considered 
effective (Figure 6A). Therefore, even though the model triggers 
CRs, they are maladaptively timed. Indeed, the synaptic efficacy w 
fails to reach a level sufficiently low to initiate well-timed CRs, as 




block block 

FIGURE 5 I Behavior of the model with simulated data. (A) Behavioral 
performance. Percentage of CRs per block of trials of the nnodel fitted with 
stability constraint (solid line) and without (dashed line). The vertical dotted 
line separates acquisition and extinction training. (B) Trajectory of w in the 
nnodel fitted with stability constraint (dashed line) and without (solid line). 
The horizontal green dotted line nnarks the level above which the model 
does not trigger any CRs. Blocks of 10 trials. 
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FIGURE 4 I Raster plots of the inputs and outputs of the model with 
and without stability constraint. (A) Model with stability constraint. PN 
detections (green), 10 detections (black), and CR triggers (blue, well-timed 
thick, and late thin). CS (yellow area) and US (pink area) periods. The 
horizontal dashed red line separates acquisition and extinction phases. 
Vertical blue line marks the limit for well-timed CRs. (B) Model without 
stability constraint. Data plotted as in (A). 
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FIGURE 6 I Results with and without delayed NOI. (A) Raster plot with 
the output of the model with the delay of the NOI set to Os. (B) Trajectory 
of w in the model with a delay of 100 ms in the NOI (solid line) and with no 
delay (dashed line). The horizontal green dotted line marks the level above 
which the model does not trigger any CRs. Blocks of 10 trials. 
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FIGURE 8 I Effect of the delayed plasticity trace on the behavior. 

(A) Model with plasticity trace starting Anoi seconds after each PN 
detection. (B) Model with plasticity trace starting right after each PN 
detection. Data plotted as in Figure 4. 
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FIGURE 7 I Results with non-delayed NOI inhibition in different 
conditions. Raster plots with the output of the model with the delay of the 
NOI set to 0 s. (B) the model constraint to acquire CRs twice as fast or with 
(A). A ratio of TDs in the 10 lowered to 50%. (C) Traces of the synaptic 
efficacy w for the simulation in Figure 6 (black) compared to the 
simulations in (A) (dotted red) and (B) (dotted blue). 



it does when the latency of the NOI is properly set (Figure 6B). 
Note, however, that the jitter of the trace of the synaptic efficacy w 
occasionally brings the CR triggers close to the criterion of correct 
timing. Given that, if such a jitter will be increased it would be 
possible for occasional CRs to anticipate the US sufficiently to be 
characterized as well-timed. This occurs if, for instance, the signal 
to noise ratio of the lO signal decreases (Figure 7B, with TDs in the 
lO lowered from 70 to 50%) or if we force the learning to be faster 
(Figure 7A). This by no means indicates that the model works 
better if the signal conditions are worse, it only indicates that as 
the dynamics of the model become more noisy (Figure 7C), some 
well-timed CRs may incidentally be triggered, even if the delay of 
the NOI is not correctly set. 

Not delaying the plasticity trace leads to a different kind of non- 
adaptive behavior. In this case, if we set an ISI below the minimum 
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FIGURE 9 I Event detection performance for the recording sites. ROC 

curves for the 10 (A) and the PN (B) event detections. 



ISI described in Equation (1), the computational model without 
delayed plasticity acquires CRs that can only be late CRs by defin- 
ition (Figure 8B). In contrast, setting the appropriate delay to the 
plasticity trace avoids building any association between CS and US 
that are too close in time (Figure 8A). 

BIO-HYBRID EXPERIMENT 
Evaluation of the training data 

We started the bio -hybrid experiment recording a training data- set 
composed of 30 trials of paired CS-US stimulation, with an ISI of 
300 ms and an ITI of 10 s. After applying the signal processing algo- 
rithms (see Materials and Methods), we built the receiver operating 
characteristic (ROC) curve for each of the channels (Figure 9). 
The PN channel TDs reached 100% with a false alarm rate close to 
0.1 Hz while the lO displayed TDs near 50% for the range of opti- 
mal FARs (~1 Hz). Therefore the PN channel was reliable while 
the lO channel was poor from the detection standpoint. 

Optimization oftlie plasticity parameters and signal detection 
regimes 

The following phase entailed tuning the plasticity parameters and 
selecting the optimal signal detection thresholds. The optimiza- 
tion process selected detection thresholds yielding a percentage of 
TDs of 48.6% and a FAR of 1.14 Hz for the lO channel, and a 
9 1 .4% of TDs with a FAR of 0. 1 1 Hz for the PN. 

The model calibration sets the potentiation and depression 
steps {8p and 5^) to 0.0161 and 3.36e— 5, respectively. 
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FIGURE 10 I Performance of the experiment predicted by the training 
data. (A) Trajectory of the memory parameter after 2500 simulations 
plotted in blocks of 10 trials. The simulated experiment contained 120 trials 
of acquisition and 180 trials of extinction. Distribution of the block-by-block 
values of w (grayscale) with mean (blue) and output of a sample simulation 
(red) are shown. We indicate the levels of the weight that result in late 
(upper green line) and well-timed CRs (lower green line). The transition from 
acquisition to extinction training is marked by a vertical line. (B) Predicted 
behavioral performance after 2500 simulations. Percentage of well-timed 
CRs. Distribution of the block-by-block performance (grayscale) with mean 
(blue) and result (red) of a sample simulation [same as in (A)]. 



The offline simulation parameterized with the previous values 
is shown in Figure 10. Firstly, on average acquisition occurs in 
40 trials with an asymptotic performance of 40% well-timed CRs. 
Secondly, there is low chance of obtaining total extinction after 120 
trials of CS-alone stimulation. Thus, the simulations predict that 
a low detection quality in the lO channel may hinder extinction. 

Evaluation of the bio-hybrid experiment 

After the preliminary assessment of the quality of the biosignals, 
we proceeded with the online classical conditioning experiment 
(Figure 11). The experiment lasted 1 h 20 min and comprised 190 
CS-US stimulation trials (acquisition) followed by 180 CS-alone 
trials (extinction), with randomized ITIs between 10 and 15 s. In 
Figure 11, we display events detected and stimulations triggered 
by the neuroprosthetic system. For the whole experiment, in the 
PN there was a TDR in of 75.5% and a FAR of 0. 1 Hz that include a 
high number of late CS-detections (Figure 11). Notably, the num- 
ber of PN detections during baseline was very low (only five false 
alarms in 80 min). In the lO, we obtained a TDR of 38% and a 
FAR of 1.2 Hz. 

Detections in both channels were delayed by tens of millisec- 
onds with respect to the stimulus trigger. The mean latency of 
the TDs in the PN {cocs) was of 96.2 ms after the CS-trigger 
(Figure 12A) whereas the mean latency in the lO channel (cous) 
was of 68.5 ms (Figure 12B). 

The experiment was successful in terms of behavior: well-timed 
CRs were triggered with regularity toward the end of the acqui- 
sition phase, and no CR was triggered during the last 90 trials of 
the extinction training (Figures 11 and 12C). The first response 
appeared at trial 29, but the first well-timed CR came only at 
trial 118. Notice that toward the end of acquisition, the series of 
well-timed CRs appeared regular. After the onset of the extinc- 
tion trials (trial 191), well-timed CRs were rapidly extinguished. 
A block-by-block analysis reveals that the performance fluctuated 
during acquisition (Figure 12C) and that extinction of well-timed 
CRs was very rapid, in total there are only four well-timed CRs 
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FIGURE 11 I Event detections and triggers during the online 
experiment. Raster plot with the PN detections (blue dots; well-timed PNs 
are thicker), 10 detections (black), and CR triggers (blue dots; well-timed 
triggers are thicker). The CS (yellow area) and US (pink area) periods are 
indicated. Blue line separates well-timed from late CRs. The horizontal 
dashed red line separates acquisition and extinction phases. 



during extinction, the last one appearing at trial 220, i.e., after 30 
trials of extinction. However, the extinction of late CRs was more 
gradual, encompassing blocks 21-29, i.e., total extinction required 
100 trials. No CR was triggered by the system in the last 60 trials of 
the experiment. Regarding the timing, well-timed CRs occurred 
on average 50 ms ahead of the US-trigger (Figure 12D). 

The evolution of the synaptic efficacy w is displayed in 
Figure 13. We estimate that given our setup CRs follow a PN 
event whenever the value of w goes below 0.4. However, for such 
CRs to be anticipatory, w should settle at 0.28 or below. During the 
experiment, w decreased steadily during the first 60 trials, down to 
a value of 0.29. Afterward, the decrease decelerated. The dynamics 
of w suggest that learning has reached an asymptotic-level by the 
end of the acquisition stage (Figure 13). The mean value of w dur- 
ing well-timed CRs was 0.25, corresponding to CRs triggered on 
average 140 ms after the PN detection. Thus, for an ISI of 300 ms 
the model acquired an internal timing (^cr) of 140 ms. 

Thus, in conclusion, at a first level of analysis, the results of 
the bio-hybrid experiment were correct both from the behavioral 
point of view and also regarding the dynamics of the underlying 
memory parameter stored in the synthetic cerebellum. 

Instability of the activity during the recording 

Having said that, there were two major caveats in the experiment. 
First, due to an artifact introduced by the electrical stimulation of 
the CR, the signal of the lO was masked for 2 s after each CR. This 
masking was performed at the signal acquisition stage (Prueckl 
et al, 2011b). For this reason, no events reached the computa- 
tional model of the cerebellum for 2 s after each CR trigger. Under 
such circumstances, the model's NOI became superfluous, because 
for all its extent there was no lO detection to be inhibited. Or, in 
other words, the mask at the signal acquisition stage acted as a NOI 
with 0 latency and longer duration. We have already argued that 



Frontiers in Bioengineering and Biotechnology | Bionics and Bionnimetics 



May 2014 | Volume 2 | Article 14 | 10 



Herreros et al. 



A cerebellar neuroprosthetic system 




0.1 0.2 0.3 0.4 0.5 
time after CS trigger (s) 




10 15^ 20. 25 30 35 
#bfock 



FIGURE 12 I Quantitative results. (A) Events detected in the PN 
Histogrann of PN detections relative to the CS-trigger:TDs (black bars) and 
FAs (gray bars); in this case all FAs are late CS-detections. CS period 
(yellow area) and US period (pink area). (B) Events detected in the 10. 
Detections in the 10 sorted relative to the US-trigger. Data plotted as in (A). 
(C) Behavioral performance of the bio-hybrid. Percentage of well-timed CRs 
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during acquisition and extinction (solid line) are shown. CRs that were not 
triggered at least 20 ms ahead of the US-trigger appear as late CRs (dashed 
line). Each block contains 10 trials. (D)Timing of CRs. Histogram of the 
CRs: well-timed (black bars) and late ones (gray bars). CS period (yellow 
area) and US period (pink area) are indicated. The information is extracted 
from trials 118 to 190. 
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FIGURE 13 I Weight trajectory during the experiment. The dashed vertical line separates the acquisition and extinction phases. 



the latency of the NOI is necessary for consistently achieving a cor- 
rect timing of the CRs (Figure 6). Thus, in summary, on the one 
hand, it is reasonable to assume that the well-timed CRs were in 
part a consequence of the noisy conditions of the input setup (e.g., 
a lOfd of 38%), and on the other, it is also reasonable to expect that 
in the absence of the stimulation artifact, the synthetic cerebellum 
would have achieved a higher proportion of well-timed CRs. 



Second, our calibration of the cerebellar model assumes that 
the level of spontaneous activity in the lO remains constant. 
If the lO spontaneous activity significantly deviates from the 
one estimated during calibration, then w will drift, eventu- 
ally leading to either acquisition or extinction. Since during 
the conditioning experiment we observed that the spontaneous 
activity fluctuated (Figure 14), there is the possibility that the 
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FIGURE 14 I Fluctuations in the spontaneous 10 rate. Mean 10 rate in 
each trial of the experinnent. The horizontal dotted line marks the 1.14 Hz 
level of activity. The vertical dashed line marks the transition from 
acquisition to extinction trials. 



behavior observed did not result from associative learning but 
from changes in w due to oscillations in the lO spontaneous 
activity. In particular, given that the lO spontaneous activity 
increased during acquisition and decreased during extinction, 
such fluctuations might have caused or favored the behavioral 
result. 

To perform an a posteriori control for this, we checked whether 
the observed oscillations in spontaneous activity may lead to 
acquisition by themselves even in the presence of temporally unre- 
lated CSs and USs. We tested this by simulating unpaired CS-US 
presentations. For this, we generate experiments with shuffled lO 
detections within each trial. After performing 20,000 simulations, 
we observed that the increased spontaneous activity of the lO 
causes a decrement in w during the acquisition phase for unpaired 
stimulation (Figure 15A). Considering the behavior (Figure 15B) 
in the average simulation, the decrease of w yielded to the trigger- 
ing of a small number of CRs. These results both at the level of the 
memory parameter and the behavior were clearly below the perfor- 
mance observed on the bio-hybrid experiment but demonstrated 
nonetheless that under experimental conditions with big vari- 
ability in the recorded signals, the model might acquire spurious 
associations. 

CALIBRATION METHOD ADAPTED TO FLUCTUATING 10 ACTIVITY 

Lastly, we tackle the problem of the instability of the lO activity 
evidenced in the previous section. For this we apply the adap- 
tive calibration method (see Materials and Methods). We test 
this method with data from the bio-hybrid preparation, aiming 
to show that with the adaptive calibration, the cerebellar model 
becomes robust against slow fluctuations in the baseline lO activ- 
ity. As a caveat, notice however, that in the bio-hybrid experiment, 
by definition of a closed-loop setup, the data recorded during the 
session depended on the output of the model. In this case, the 
data recorded were affected by the electrical stimulation in the FN 
driving the CR. Thus, to cancel out this effect we replaced the 3 s 
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FIGURE 15 I Observed performance vs. performance during 
simulated unpaired acquisition. (A) Acquisition during paired CS-US 
training versus simulated unpaired CS-US. Trajectory of the weight 
during the acquisition phase of the experinnent (black line) plotted 
against results of 20,000 simulations of unpaired training. Distribution 
of the simulation results (grayscale), median (blue dotted line), and the 
0.05 bottom of the distribution (red line) is shown. Blocks of 10 trials. 
(B) Behavioral performance during acquisition against performance in 
the simulations. Percentage of CRs during acquisition in the 
experiment (black line) plotted against the percentage obtained in the 
simulations. Distribution of the simulation performances (grayscale), 
median (blue dotted), and the upper 0.1 of the distribution (red line). 



of the lO signal occurring after each CR, by the signal extracted 
from random trials with no CR. 

The results (Figure 16) now clearly separate the performance 
between the unpaired- and paired- stimulation experiments. Most 
importantly, there is no acquisition of CRs during unpaired stim- 
ulation. In this case, the fluctuations of the baseline lO rate do 
not push w further than ±0. 1 from the starting value, both during 
acquisition and extinction. Regarding the performance, in average 
there are no well-timed CRs with unpaired CS-US training. On the 
contrary, in the simulated acquisition and extinction experiment, 
the overall behavior of acquisition followed by extinction is pre- 
served. In this experiment, the CR performance decreases relative 
to the result with the bio-hybrid, especially by the end of acquisi- 
tion, when the recorded lO baseline rate was higher. The reason is 
that now a high rate of spontaneous detections in the lO dimin- 
ishes the relative saliency of the US -evoked events, because such a 
high rate harms rather than helps acquisition. Thus, this simula- 
tion confirms that if the same conditions of signal instability of the 
bio-hybrid experiment are repeated, with the adaptive calibration 
method we can ensure that any CRs observed will specifically be 
due to the CS-US association. 

DISCUSSION 

In this paper, we have addressed the challenge of defining, inter- 
facing, and validating a neuroprosthetic system for the cerebel- 
lum. More in detail, we have: defined a biologically grounded 
computational model of the circuit targeted for substitution; 
defined its input and output structures and decoded input events; 
implemented a prosthetic cerebellum; interfaced it to a rat brain. 
Our results show that our bio-hybrid preparation shows behav- 
iorally and physiologically valid forms of acquisition and extinc- 
tion of the conditioned eye-blink response. Our neuroprosthetic 
system learned to associate a tone with an air-puff, and as result 
to trigger an anticipatory closure of the eyelid. Since motor CRs 
are not acquired or expressed under such anesthesia regime, the 
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FIGURE 16 I Performance with the adaptive calibration method. 

(A) Trajectory of the memory parameter w plotted in blocks of 10 trials. 
Distribution of the performance of the simulations of unpaired CS-US 
stimulation (grayscale), with mean (dotted blue) and lower 10% (dotted red), 
for a total of 2500 simulations of 36 blocks. Trajectory of the simulated 
classical conditioning experiment (solid black), with 18 blocks of acquisition 
and 18 blocks of extinction. The transition from acquisition to extinction 
training is marked by a vertical line. (B) Behavioral performance of the same 
simulations. Percentage of well-timed CRs per block is shown. Results 
plotted with the same convention as in (A). 



observed eye-blinks are produced by the synthetic system. The 
fact that acquired CRs can be aboHshed by extinction training also 
indicates that the CRs result from a learning process induced by 
the contingent association of the CS and the US, thus reproducing 
the hallmark result of Pavlovian classical conditioning. 

Here, we have presented a step towards the enhancement and/or 
recovery of the capabilities of central nervous system through 
neuroprosthetic solutions. Recently, another closed-loop solution 
targeting a different structure of the brain, the hippocampus 
(Berger et al., 20 11) has been presented. This system, however, fol- 
lows a different approach where firstly the subject had to acquire 
a specific stimulus response association that was recorded by the 
neuroprosthetic system, and subsequently the recorded state was 
used to recover this association after lesion to the hippocam- 
pus. Compared to such model, instead of aiming at restoring 
an acquired memory, our neuroprosthetic aims to fully replace 
its target structure and to realize the capability to form new 
memories. 

In a parallel effort, the computational model of the cerebellum 
here presented, together with the signal detection algorithms and 
the signal acquisition components, have been implemented in a 
low power VLSI (Bamford et al, 2012). Hence, with the results 
presented here we provide the feasibility requirements of a neu- 
roprosthetic system, encompassing issues related to stability and 
non-stationarities. 

An earlier version of the computational model presented here 
was implemented in an aVLSI platform and interfaced with a 
robot that was conditioned to a visual stimulus predicting a col- 
lision (Verschure and Mintz, 2001; Hofstotter et al, 2002, 2004). 
Thus, after showing that our approach allowed miniaturization 
and autonomous performance, we have now demonstrated that 
the model can be applied to the processing of inputs coming from 
a living brain that are specific to the computation performed, i.e., 
CS, US, and trigger its specific output: the CR. 

In our experiment, the lO channel provided the only teaching 
signal to the system. This channel displayed a spontaneous level 



of activity in the 0.5-2 Hz range, i.e., the level of activity expected 
in a single lO cell. However, in healthy animals, acquisition of an 
eye-blink CR is controlled by a cerebellar micro-complex, encom- 
passing not one, but a number of lO cells. This imposes different 
constraints on the learning system because the lO- derived error 
signal for our neuroprosthetic is in all likelihood much impover- 
ished as compared to its biological counterpart. Hence, we expect 
that the key feature to strengthen in our approach is the quality 
and precision of the data acquisition of the biological preparation. 
For this, we are planning further experiments in a chronic implant 
together with higher bandwidth physiology. 

We reported two major caveats in the experimental preparation: 
the instability of the lO spontaneous rate, and presence of a stim- 
ulation artifact that precluded reliable read- out of the lO signal. 
Regarding the first problem, in the bio-hybrid experiment we com- 
puted the plasticity parameters assuming that the spontaneous lO 
rate inferred from the calibration data remained stable through- 
out the experiment. However, we observed that fluctuations in 
the spontaneous lO firing rate induced a drift in the w synap- 
tic weight. Comparing this performance with simulated unpaired 
CS-US experiments, we saw that the performance with unpaired 
stimulation tended to be below the one observed in our exper- 
iment, but we also saw that the system triggered non-associative 
CRs. Next, we showed in simulations that with an adaptive calibra- 
tion method it is possible to compensate for the fluctuations in the 
lO activity and thus to prevent the acquisition of non-associative 
CRs (see Figure 16). We can conclude from the bio-hybrid exper- 
iment that our silicon cerebellum neuroprosthetic can be tuned 
to deal with marked fluctuations of its input brain signals. This 
also demonstrates the robustness of the learning principles imple- 
mented in the cerebellum and in particular the negative feedback 
loop implemented by the DN-IO system. 

Additionally, the fluctuations of the lO signal reduced the accu- 
racy of the performance predicted during the calibration step. This 
justifies the future development of calibration methods accounting 
for spontaneous drifts in the recorded neural activity. 

Regarding the problem of the stimulation artifact, addressing it 
falls outside the scope of the current analysis. We emphasize that 
this problem is an issue of engineering of the stimulation system 
that the biological system does not encounter. We are investigating 
two solutions. First, given the very short duration of the stimula- 
tion pulses (Prueckl et al., 2011b), it is possible to apply a more 
precise masking to the lO signal, timed to these pulses, that would 
minimize the signal loss. A second possibility is to avoid electrical 
stimulation altogether using optogenetics. These aspects need to 
be taken into account in a next iteration of the neural prosthesis 
development. 

In conclusion, from a bio-engineering perspective we demon- 
strate that our approach supports outsourcing the acquisition and 
extinction of an adaptive reflex in an acute preparation to a linked 
neuroprosthetic system. Given the modularity of the cerebellum, 
and the common assumption that the cerebellar algorithm per- 
forms similar computations throughout its different microcircuits 
(Marr, 1969; Albus, 1971; Dean et al, 2010), this work could be 
applied to support other adaptive reflexes as well, as long as their 
afferent and efferent circuitry could be identified. However, the 
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multifunctionality of a microzone and/or its interactions with 
other cerebellar microcircuits are not addressed with our approach 
and would require interconnecting and synchronizing multiple 
prosthetic microcircuits working in parallel. Additional work is 
required to reproduce this result with the aVLSI synthetic system, 
testing this approach with a chronic implant, where one could 
assess the stability of the acquired memory across days and study 
long-term bio-compatibility issues. 
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APPENDIX 

Table A1 | List of abbreviations. 



Process 1 : Trace generation. 



Abbreviation IVIeaning 



10 

PN 

DN 

FN 

NOI 

CS 

CR 

US 

UR 

ISI 

ITI 

^noi 

TDR 

FAR 

PNtd 

PNfar 

\Otd 

lOfar 

w 

^cs 

'^US 
-^CR 
Pi 
Pi 

Di 
Di 

Aa 
Ae 
7a 
Te 



Inferior olive 
Pontine nuclei 
Deep nuclei 
Facial nucleus 
Nucleo-olivary inhibition 
Conditioning stinnulus 
Conditioned response 
Unconditioned stinnulus 
Unconditioned response 
Inter-stinnulus interval 
Inter-trial interval 
Latency of the NOI 
Potentiation step size 
Depression step size 
Proportion of true detections 
False alarnn rate 

Proportion of true detections in the PN 
Rate of false alarnns in the PN 
Proportion of true detections in the 10 
Rate of false alarnns in the 10 
Synaptic efficacy 

Latency between CS-trigger and CS event detection 
Latency between US-trigger and US-event detection 
Latency between CR-trigger and its physical execution 
Nunnber of potentiation events in the condition / 
Mean nunnber of potentiation events per trial in the 
condition / 

Nunnber of depression events in the condition / 
Mean nunnber of depression events per trial in the 
condition / 

Estinnated annount of change in w required for acquisition 
Estinnated annount of change in w required for extinction 
Desired nunnber of trials required for acquisition 
Desired nunnber of trials required for extinction 



1. MATHEMATICAL DEFINITION OF THE CEREBELLAR MODEL 

In what follows we provide a compact definition of the cerebellar 
computational model, describing each of the processes separately. 
The only additional notation introduced here for convenience are 
the sample time s, that it is used to access the state of a variable in 
the previous step of the computation, and the Heaviside function, 
H{x)y defined as: 



H(x) 



1 if X > 1 
0 elsewhere ' 



The signals and the processes correspond to the ones pre- 
sented in Figure 3. Note that except T(t), S(t), and that 
are real-valued, the rest of the signals are binary. 



T(f) = ro'"<'MH(T(f ■ 



5) - Ti) T(t - 5) 



To 



Ar 



(l-P(t)) 



Process 2: Scaling. 

s(o = w(tmt). 

Process 3: Thresholding. 

F(f) = HiOcR - SCO) H(S(t - 5) - 9cr). 
Process 4: Inhibitory pulse. 

N(t) = N(t - s) + F(t) - F(t - A,). 
Process 5: Delay of inhibitory pulse. 

NA(0 = N(f-A„oi). 

Process 6: Gating. 

C(f) = I(f)(l-NA(f)). 
Process 7: Delay of plasticity trace. 

n(0 = Hint - Anoi)). 

Process 8: Coincidence detection. 

w(t) = w(t - s) + n(t)(8p - C(t)8d). 

2. DERIVATION OF THE UPDATE FUNCTION OF THE ADAPTIVE 
CALIBRATION MODEL 

We assume that during spontaneous activity, detections in the lO 
and the PN are independent. Consequently, the probability of a 
simultaneous detection in both channels is defined as PNfar lOfar- 



1)3 ^PNfarArlOfarTs, 



(Al) 



~Pi Di' 










Pi Di 








0 



where is the duration of the original recording used for the 
calibration and is the duration of the plasticity trace. 
The linear system to optimize is given by: 



(A2) 

The expression in matrix notation becomes: 

Ax = b. (A3) 
Adding the cost matrices for the weighted least squares, we get: 
CAx = Cb, 
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where 



The last step to solve the system is to multiply by b, after what 



ci 0 0 
0 C2 0 
0 0 C3 



(A4) 



With this, the least squares solution for x is given by 

x = (A^CA)"^A^Cb. 

First, A^CA expressed in terms of plasticity events and costs is 
the following matrix 

r CiP2 + C2P| + C3PI qPiDi + C2P2D2 + C3P3D3'] 
[ciPiDi + C2P2D2 + C3P3D3 ciDj + C2D| + C3DI J • 

(A5) 

Grouping and renaming all the terms that do not depend on 
D3, we can express this matrix as: 



r Xl _ X2-\-f2D3 
|X2 + V^2^^3 X3 + V^3^^i 



D3I 



(A6) 



The expressions for the x ^rid terms are given at the end of 
this Appendix. 

The determinant of this matrix, D, expressed as polynomial of 
D3 is 

(xiV^3 + ^2) D3 + 2x2V^2£>3 + (xiX3 - xl) . 

Using Equation (Al), we can simplify the notation, and express 
this determinant as a polynomial of lOfar, which is the input 
variable that will be updated at each recalibration. 

Now we can express (A-^CA)~^A-^C as 



J_ r X3 + ^3Dj -X2 - i^2D3 

D I -X2 - V^2£>3 Xl 



qPi C2P2 C3P3I 
qDi C2D2 C3D3J ' 



where the first two factor are the inverse of A^CA and the third 
term in the result of A^C. 

Performing the algebra, we obtain the following matrix 
expression: 



(X3 + ^3Dj)ciPi 



(X3 + ^3Dj)C2P2 



3^ L ^ _ ~ V3^3. 

-(X2 + 1^2^3)51 A _ -(X2 + V^2^3)f2^2 _ 

XlCiDi - (X2 + '^2D3)CiPi X1C2D2 - (X2 + V^2^^3)Q^2 



where we have omitted the third column of the matrix, since it will 
be canceled by the zero in the third of row b. 



we obtain the solution for the 8p and 8 pi 



1 A, 



^P = -^ir (^^3 + i^3Dj)ClPl - (X2 + i^2D3)CiDi) 
+ ^^^^ V^3D3)C2^2 - (X2 + f2lh)C2b2) 



and 



8d- 



© Ta 



(xiCiDi -(X2 + V^2D3)ciPi) 



where we see that D3 appears in the numerator oi&p with degree 2 
and in the numerator of 5 ^ with degree 1 . In both cases, the denom- 
inator is given by the determinant S), which is another polynomial 
of degree 2. Therefore, we can express each parameter as a ratio of 
two polynomials. 

^ _ «2lOfjjj. + CCiIOfar + ao 

/S2l02^ + AIOfar + y6o' 
^ ^ KllOfar + KO 

AIO^„ + ySiIOfar + ;So' 

In consequence, the adaptive calibration algorithm requires to 
pre-compute 8 coefficients. The expression for each coefficient is 
detailed next: 





(X1X3 - x|)> 




Pl = 


2X2 V^2, 




h = 


(xiV^3 + V^l), 






J^(X2ClDi-X3ClPl) + 

^ a 


|^(X3ClPl 


ai = 


T^fiCibi - ^f3C2b2, 




0L2 = 


T^f}C2P2 - T^fiClPu 

^ a 




yo = 


f^(XlClDl -X2ClPl) + 

^ a 


7"(XlC2D2 

^ e 


n = 


T^f2C2P2 - T^f2ClPu 





where x and \lr are given by 



Xl = CiPf + C2PI + C3P3^ 
X2 = CiPiDi + C2P2£>2, 

X3 = cib\ + C2bl, 

V^2 = ^^3^3, 
^3 = C3. 
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